City trees are important: they purify the air, reduce heat islands, help regulate the water cycle and provide immense health benefits. Trees play an important role in increasing urban biodiversity, providing plants and animals with a favourable habitat, food and protection. A mature tree absorbs more CO2 per year than a young one. As a result, trees play an important role in climate change mitigation. Especially in cities with high levels of pollution, trees can improve air quality, making cities healthier places to live in.
Large trees are excellent filters for urban pollutants and fine particulates. They absorb pollutant gases (such as carbon monoxide, nitrogen oxides, ozone and sulfur oxides) and filter dust, dirt or smoke out of the air by trapping them on leaves and bark. Living close to urban green spaces and having access to them can improve physical and mental health. This, in turn, contributes to the well-being of urban communities.
Trees also help to reduce carbon emissions by conserving energy. For example, the correct placement of trees around buildings can reduce the need for air conditioning and lower winter heating bills. In addition, planning urban landscapes with trees can increase property values and attract tourism and businesses.
# Importing in required libraries
import pandas as pd
import altair as alt
alt.data_transformers.enable('default', max_rows=1000000)
import json
Importing the data and dropping columns that will not help answer the questions stated in the introduction. Converting the date_planted column into year, month, and day columns. Dropping all rows that have a missing value.
# Importing the data
url = 'https://raw.githubusercontent.com/UBC-MDS/data_viz_wrangled/main/data/Trees_data_sets/small_vancouver_trees.csv'
df = pd.read_csv(url, parse_dates=['date_planted'])
# Dropping columns that are not going to be used in analysis or helpful in answering questions as stated in introduction
df = df.drop(columns=['std_street','on_street', 'civic_number', 'tree_id' , 'cultivar_name', 'genus_name', 'assigned', 'plant_area', 'common_name' ,'on_street_block','root_barrier'])
# Converting the date_planted column into separate year, month, and day columns.
datetimes = pd.to_datetime(df['date_planted'])
df[['year','month','day']] = datetimes.dt.date.astype(str).str.split('-',expand=True)
# Removing all the rows that are missing values to ensure analysis doesn't add any unnecessary bias
df = df.dropna()
# Adding a column in the dataset that will determine height and diameter ratio of a tree
df = df.assign(height_diameter_ratio = df['height_range_id']/df['diameter'])
df.head()
| | Unnamed: 0 | species_name | neighbourhood_name | date_planted | diameter | street_side_name | curb | height_range_id | latitude | longitude | year | month | day | height_diameter_ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 9 | 13029 | GRANDIFLORA X | Renfrew-Collingwood | 2013-01-21 | 3.00 | ODD | N | 1 | 49.250114 | -123.039156 | 2013 | 01 | 21 | 0.333333 |
| 10 | 14062 | ROBUR | Kitsilano | 1995-03-15 | 13.00 | EVEN | Y | 3 | 49.259133 | -123.155318 | 1995 | 03 | 15 | 0.230769 |
| 12 | 3515 | SYLVATICA | Renfrew-Collingwood | 2001-05-01 | 3.00 | ODD | Y | 1 | 49.241922 | -123.046271 | 2001 | 05 | 01 | 0.333333 |
| 16 | 14533 | PENNSYLVANICA | Hastings-Sunrise | 2003-01-06 | 8.00 | ODD | Y | 2 | 49.262000 | -123.036142 | 2003 | 01 | 06 | 0.250000 |
| 18 | 13410 | KOUSA | Marpole | 1993-11-29 | 6.25 | ODD | Y | 2 | 49.211428 | -123.125269 | 1993 | 11 | 29 | 0.320000 |
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2338 entries, 9 to 4999
Data columns (total 14 columns):
 #   Column                 Non-Null Count  Dtype
---  ------                 --------------  -----
 0   Unnamed: 0             2338 non-null   int64
 1   species_name           2338 non-null   object
 2   neighbourhood_name     2338 non-null   object
 3   date_planted           2338 non-null   datetime64[ns]
 4   diameter               2338 non-null   float64
 5   street_side_name       2338 non-null   object
 6   curb                   2338 non-null   object
 7   height_range_id        2338 non-null   int64
 8   latitude               2338 non-null   float64
 9   longitude              2338 non-null   float64
 10  year                   2338 non-null   object
 11  month                  2338 non-null   object
 12  day                    2338 non-null   object
 13  height_diameter_ratio  2338 non-null   float64
dtypes: datetime64[ns](1), float64(4), int64(2), object(7)
memory usage: 274.0+ KB
df.describe(include='all')
<ipython-input-4-174ba9bf1a5c>:1: FutureWarning: Treating datetime data as categorical rather than numeric in `.describe` is deprecated and will be removed in a future version of pandas. Specify `datetime_is_numeric=True` to silence this warning and adopt the future behavior now. df.describe(include='all')
| | Unnamed: 0 | species_name | neighbourhood_name | date_planted | diameter | street_side_name | curb | height_range_id | latitude | longitude | year | month | day | height_diameter_ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 2338.000000 | 2338 | 2338 | 2338 | 2338.000000 | 2338 | 2338 | 2338.000000 | 2338.000000 | 2338.000000 | 2338 | 2338 | 2338 | 2338.000000 |
| unique | NaN | 109 | 22 | 1466 | NaN | 3 | 2 | NaN | NaN | NaN | 31 | 11 | 31 | NaN |
| top | NaN | PLATANOIDES | Renfrew-Collingwood | 2006-11-21 00:00:00 | NaN | ODD | Y | NaN | NaN | NaN | 1998 | 02 | 04 | NaN |
| freq | NaN | 192 | 230 | 8 | NaN | 1173 | 2163 | NaN | NaN | NaN | 127 | 447 | 97 | NaN |
| first | NaN | NaN | NaN | 1989-11-15 00:00:00 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| last | NaN | NaN | NaN | 2019-04-16 00:00:00 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| mean | 10232.198033 | NaN | NaN | NaN | 6.231801 | NaN | NaN | 1.792130 | 49.246617 | -123.098752 | NaN | NaN | NaN | 0.341225 |
| std | 5880.952995 | NaN | NaN | NaN | 4.351155 | NaN | NaN | 0.945447 | 0.021065 | 0.048890 | NaN | NaN | NaN | 0.197366 |
| min | 10.000000 | NaN | NaN | NaN | 0.500000 | NaN | NaN | 0.000000 | 49.201366 | -123.223440 | NaN | NaN | NaN | 0.000000 |
| 25% | 4905.250000 | NaN | NaN | NaN | 3.000000 | NaN | NaN | 1.000000 | 49.229326 | -123.136656 | NaN | NaN | NaN | 0.250000 |
| 50% | 10546.000000 | NaN | NaN | NaN | 5.000000 | NaN | NaN | 2.000000 | 49.246503 | -123.092832 | NaN | NaN | NaN | 0.333333 |
| 75% | 15386.750000 | NaN | NaN | NaN | 8.000000 | NaN | NaN | 2.000000 | 49.262892 | -123.057433 | NaN | NaN | NaN | 0.352941 |
| max | 19997.000000 | NaN | NaN | NaN | 52.000000 | NaN | NaN | 7.000000 | 49.293881 | -123.022469 | NaN | NaN | NaN | 4.000000 |
We are using about 2,300 data points (2,338 rows) in this analysis. The dataset contains 109 unique species and 22 unique neighbourhoods in Vancouver. The original dataset also records when each tree was planted, its diameter, height, genus name and the side of the street it is planted on, along with various other miscellaneous information. The latitude and longitude of each tree are also provided. All rows with null values have been removed so that the analysis runs on complete records. Two kinds of columns were added to the original dataset: the height-to-diameter ratio of each tree, and the date_planted column split into year, month, and day.
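The derived columns described above can be sketched on a small toy frame. This is a minimal illustration, not the notebook's exact code: the rows are made up, and the date parts are extracted with `strftime` rather than by splitting strings, which is simply an alternative way to get the same zero-padded text values.

```python
import pandas as pd

# Toy stand-in for the tree data; the values are illustrative only.
toy = pd.DataFrame({
    "date_planted": pd.to_datetime(["2013-01-21", "1995-03-15", None]),
    "diameter": [3.0, 13.0, 5.0],
    "height_range_id": [1, 3, 2],
})

# Drop incomplete rows first so the date accessors never see a missing date.
toy = toy.dropna()

# Derive the date parts (as zero-padded strings) and the height-to-diameter ratio.
toy = toy.assign(
    year=toy["date_planted"].dt.strftime("%Y"),
    month=toy["date_planted"].dt.strftime("%m"),
    day=toy["date_planted"].dt.strftime("%d"),
    height_diameter_ratio=toy["height_range_id"] / toy["diameter"],
)
print(toy)
```

Keeping the date parts as strings matches the `object` dtype shown for year, month and day in the `df.info()` output.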
To explore the first question, I will use the columns year, height_range_id, diameter, height_diameter_ratio, latitude and longitude. Information from these columns will help evaluate the relationship between the height and diameter of a tree and its age (planting date).
To explore the second question, I will use the columns species_name, neighbourhood_name, latitude and longitude. Information from these columns will help evaluate the species distribution across the city of Vancouver.
click = alt.selection_multi()
chart1 = (alt.Chart(df).mark_bar().encode(
alt.X('year', title='Year'),
alt.Y('count()', title='Total Trees Planted in the Year', sort='x'),
alt.Color('year', title="Year"),
opacity=alt.condition(click, alt.value(0.9), alt.value(0.2)))
.add_selection(click)).properties(width=400, title = "Figure 1: Total Trees Planted from 1989 to 2019 across Vancouver")
chart1
Figure 1 above shows that, in general, 80 or more trees were planted each year from 1995 to 2013. Outside this period, the number of trees planted is significantly lower. In the most recent years, the city has planted fewer than 25 trees each year.
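The counts behind Figure 1 can also be checked numerically. The sketch below assumes a frame with a string `year` column like ours; the helper name `trees_per_year` and the toy rows are hypothetical and only serve the illustration.

```python
import pandas as pd

def trees_per_year(frame, year_col="year"):
    """Return the number of trees planted in each year, ordered by year."""
    return frame[year_col].value_counts().sort_index()

# Toy data: three trees planted in 1995, one in 2013 and one in 2019.
toy = pd.DataFrame({"year": ["1995", "1995", "2013", "2019", "1995"]})
counts = trees_per_year(toy)

# Years that reach a chosen threshold (2 here; 80 would match the text above).
busy_years = counts[counts >= 2].index.tolist()
print(counts)
print(busy_years)
```

On the real data, `trees_per_year(df)` would give the same numbers as the bar heights in Figure 1.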
click = alt.selection_multi()
chart2 = (alt.Chart(df).mark_bar().encode(
alt.X('year', title='Year'),
alt.Y('mean(height_range_id)', title='Mean height of trees', sort='x'),
alt.Color('year', title="Year"),
opacity=alt.condition(click, alt.value(0.9), alt.value(0.2)))
.add_selection(click)).properties(width=400, title = 'Figure 2: Mean Height of Trees Planted from 1989 to 2019')
chart2
Figure 2 shows that the older the tree, the greater its height on average. This makes sense, since plants tend to grow taller over time. Because an unequal number of trees were planted each year, there are some discrepancies. Overall, however, we can say that trees tend to be taller with age.
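Because the yearly means rest on unequal numbers of trees, it can help to look at each mean next to its sample size. The following is a minimal sketch under that assumption; `summarise_by_year` is a hypothetical helper and the toy values are made up.

```python
import pandas as pd

def summarise_by_year(frame, value_col):
    """Mean of value_col per planting year, with the number of trees behind it."""
    return (
        frame.groupby("year")[value_col]
        .agg(mean_value="mean", n_trees="size")
        .reset_index()
    )

# Toy data: two trees planted in 1995, one in 2013.
toy = pd.DataFrame({
    "year": ["1995", "1995", "2013"],
    "height_range_id": [3, 4, 1],
})
summary = summarise_by_year(toy, "height_range_id")
print(summary)
```

A year with a small `n_trees` is more likely to produce a mean that departs from the overall trend.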
click = alt.selection_multi()
chart3 = (alt.Chart(df).mark_bar().encode(
alt.X('year', title='Year'),
alt.Y('mean(diameter)', title='Mean diameter of the tree', sort='x'),
alt.Color('year', title="Year"),
opacity=alt.condition(click, alt.value(0.9), alt.value(0.2)))
.add_selection(click)).properties(width=400, title = 'Figure 3: Mean Diameter of Trees Planted from 1989 to 2019')
chart3
Figure 3 shows that the older the tree, the greater its diameter on average. This makes sense, since plants tend to grow wider over time. Because an unequal number of trees were planted each year, there are some discrepancies. Overall, however, we can say that trees tend to be wider with age.
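One way to put a number on the trends in Figures 2 and 3 is a correlation between tree age and size. This is only a sketch and not part of the original analysis: it assumes age is measured relative to 2019 (the last planting year in the data), uses the Pearson coefficient, and the helper name `age_correlation` is hypothetical.

```python
import pandas as pd

def age_correlation(frame, value_col, reference_year=2019):
    """Pearson correlation between tree age (in years) and value_col."""
    age = reference_year - frame["year"].astype(int)
    return age.corr(frame[value_col])

# Toy data in which older trees are wider.
toy = pd.DataFrame({
    "year": ["1990", "2000", "2010", "2018"],
    "diameter": [20.0, 12.0, 6.0, 3.0],
})
r = age_correlation(toy, "diameter")
print(round(r, 3))
```

A value close to 1 would support the reading that older trees tend to be larger; a value near 0 would not.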
click = alt.selection_multi()
chart4 = (alt.Chart(df).mark_bar().encode(
alt.X('year', title='Year'),
alt.Y('mean(height_diameter_ratio)', title='Mean height to diameter ratio of the tree', sort='x'),
alt.Color('year', title="Year"),
opacity=alt.condition(click, alt.value(0.9), alt.value(0.2)))
.add_selection(click)).properties(width=400, title = 'Figure 4: Mean Height:Diameter of Trees Planted from 1989 to 2019')
chart4
Figure 4 shows that the height-to-diameter ratio is more or less consistent across the years. This suggests that most tree species grow taller and wider at a similar rate. The mean ratio typically falls between 0.30 and 0.35.
# Next, combining Figures 1 to 4 in a specific layout.
# Adding a column selection such that selecting one year will highlight bars of the same year across all the charts.
click = alt.selection_multi()
chart1 = (alt.Chart(df).mark_bar().encode(
alt.X('year', title='Year'),
alt.Y('count()', title='Total Trees Planted in the Year', sort='x'),
alt.Color('year', title="Year"),
opacity=alt.condition(click, alt.value(0.9), alt.value(0.2)))
.add_selection(click)).properties(width=400, title = "Figure 1: Total Trees Planted from 1989 to 2019 across Vancouver")
chart2 = (alt.Chart(df).mark_bar().encode(
alt.X('year', title='Year'),
alt.Y('mean(height_range_id)', title='Mean height of trees', sort='x'),
alt.Color('year', title="Year"),
opacity=alt.condition(click, alt.value(0.9), alt.value(0.2)))
.add_selection(click)).properties(width=400, title = 'Figure 2: Mean Height of Trees Planted from 1989 to 2019')
chart3 = (alt.Chart(df).mark_bar().encode(
alt.X('year', title='Year'),
alt.Y('mean(diameter)', title='Mean diameter of the tree', sort='x'),
alt.Color('year', title="Year"),
opacity=alt.condition(click, alt.value(0.9), alt.value(0.2)))
.add_selection(click)).properties(width=400, title = 'Figure 3: Mean Diameter of Trees Planted from 1989 to 2019')
chart4 = (alt.Chart(df).mark_bar().encode(
alt.X('year', title='Year'),
alt.Y('mean(height_diameter_ratio)', title='Mean height to diameter ratio of the tree', sort='x'),
alt.Color('year', title="Year"),
opacity=alt.condition(click, alt.value(0.9), alt.value(0.2)))
.add_selection(click)).properties(width=400, title = 'Figure 4: Mean Height:Diameter of Trees Planted from 1989 to 2019')
combined = (chart1) & (chart2 | chart3) & (chart4)
# Plots with slider filter
# A slider filter 1
# This plots the diameter of each tree against the year it was planted. The slider lets us explore the data in increments of 5.
slider = alt.binding_range(min=0, max=60, step=5, name='Diameter')
selector = alt.selection_single(name="SelectorName", fields=['diameter'],
bind=slider, init={'diameter': 0})
filter_year2 = alt.Chart(df).mark_point().encode(
x=alt.X('year', title='Year'),
y=alt.Y('diameter', title='Diameter'),
color=alt.condition(
alt.datum.diameter < selector.diameter,
alt.value('red'), alt.value('blue')
)
).add_selection(
selector
).properties(width=400, title = 'Figure 5: Diameter of Trees Planted from 1989 to 2019')
# A slider filter 2
# This plots the height of each tree against the year it was planted. The slider lets us explore the data in increments of 0.5.
slider = alt.binding_range(min=0, max=8, step=0.5, name='Height')
selector = alt.selection_single(name="SelectorName", fields=['height_range_id'],
bind=slider, init={'height_range_id': 0})
filter_year3 = alt.Chart(df).mark_point().encode(
x=alt.X('year', title='Year'),
y=alt.Y('height_range_id', title='Height'),
color=alt.condition(
alt.datum.height_range_id < selector.height_range_id,
alt.value('red'), alt.value('blue')
)
).add_selection(
selector
).properties(width=400, title = 'Figure 6: Height of Trees Planted from 1989 to 2019')
# Layout for slider 1 and 2 plots
points_combined = filter_year2 | filter_year3
points_combined
Figures 5 and 6 show the diameter and height of trees planted from 1989 to 2019, respectively. They let us examine the individual data points in case the averages used in the previous figures hide important detail. We can clearly see outliers, which helps explain why the averages for some years do not follow the trend. For example, data points from 1998 are more widely spread and include clear outliers, which leads to averages that are slightly higher than anticipated. However, for the purpose of this analysis we will not remove any outliers from the dataset.
Next, we will explore the distribution of tree height and diameter across the various Vancouver neighbourhoods.
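In this notebook the outliers are identified visually. If we wanted to flag them programmatically instead, one common convention is the 1.5 x IQR rule. The sketch below assumes that rule (it is not used elsewhere in the analysis), and `flag_outliers` is a hypothetical helper.

```python
import pandas as pd

def flag_outliers(values, k=1.5):
    """Return a boolean mask marking values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)

# Toy diameters: five ordinary trees and one very wide one.
toy = pd.Series([3.0, 4.0, 5.0, 6.0, 5.0, 52.0])
mask = flag_outliers(toy)
print(toy[mask])
```

Applied per year, for example with `df.groupby('year')['diameter'].transform(flag_outliers)`, the mask would mark the points that pull a yearly mean away from the trend without removing them from the dataset.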
# Import relevant data
url_geojson = 'https://raw.githubusercontent.com/UBC-MDS/exploratory-data-viz/main/data/local-area-boundary.geojson'
data_geojson_remote = alt.Data(url=url_geojson, format=alt.DataFormat(property='features',type='json'))
# Create Map of Vancouver
vancouver_map = alt.Chart(data_geojson_remote).mark_geoshape(
color = 'gray', opacity= 0.5, stroke='white').encode(
).project(type='identity', reflectY=True)
# Filter Relevant Dataset
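# numeric_only=True restricts the median to numeric columns, so text and date columns do not raise errors in newer pandas versions
median_df = df.groupby('neighbourhood_name'
).median(numeric_only=True).reset_index(
).rename(columns={'neighbourhood_name':'name'})[['name',
'diameter',
'latitude',
'longitude']]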
# Add ability to explore the data via hovering over the map
hover = alt.selection_single(fields=['name'], on='mouseover')
# Map to show tree diameter distribution across Vancouver
chart5 = alt.Chart(data_geojson_remote).mark_geoshape().transform_lookup(
lookup='properties.name',
from_=alt.LookupData(median_df, 'name', ['diameter', 'name'])).encode(
color=alt.Color('diameter:Q', title='Diameter'),
opacity=alt.condition(hover, alt.value(1),alt.value(0.4)),
tooltip=['name:N', alt.Tooltip('diameter:Q', title='Diameter')]).project(type='identity', reflectY=True).properties(title='Map 1: Diameter of Trees across Vancouver Neighbourhood').add_selection(hover)
# Median height per neighbourhood (numeric columns only)
median_dfs = df.groupby('neighbourhood_name'
).median(numeric_only=True).reset_index(
).rename(columns={'neighbourhood_name':'name'})[['name',
'height_range_id',
'latitude',
'longitude']]
# Map to show tree height distribution across Vancouver
chart6 = alt.Chart(data_geojson_remote).mark_geoshape().transform_lookup(
lookup='properties.name',
from_=alt.LookupData(median_dfs, 'name', ['height_range_id', 'name'])).encode(
color=alt.Color('height_range_id:Q', title = 'Height'),
opacity=alt.condition(hover, alt.value(1),alt.value(0.4)),
tooltip=['name:N', alt.Tooltip('height_range_id:Q', title='Height')]).project(type='identity', reflectY=True).properties(title='Map 2: Height of Trees across Vancouver Neighbourhood').add_selection(hover)
chart6
# Layout to effectively combine both charts
map_combined = chart5 | chart6
map1 = map_combined.resolve_scale(color='independent')
map1
Previously, we found that larger diameters and heights are associated with older trees. The Hastings-Sunrise neighbourhood has the greatest number of older trees. The age distribution of trees is quite uneven across the city. In general, however, the northern side of the city has larger or older trees than the southern side.
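The neighbourhood comparison can also be read off a ranked table rather than a map. The sketch below assumes the same `neighbourhood_name` and `diameter` columns as our data; `median_by_neighbourhood` is a hypothetical helper and the toy rows are made up.

```python
import pandas as pd

def median_by_neighbourhood(frame, value_col):
    """Median of value_col per neighbourhood, largest first."""
    return (
        frame.groupby("neighbourhood_name")[value_col]
        .median()
        .sort_values(ascending=False)
    )

# Toy data with two neighbourhoods.
toy = pd.DataFrame({
    "neighbourhood_name": ["Kitsilano", "Kitsilano", "Marpole", "Marpole", "Marpole"],
    "diameter": [13.0, 9.0, 6.25, 3.0, 4.0],
})
ranking = median_by_neighbourhood(toy, "diameter")
print(ranking)
```

The same helper works for `height_range_id`, which makes it easy to compare the two rankings side by side.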
#Interim Layout for Question # 1
Layout_diameter_height = (map1 & combined) & (filter_year2 | filter_year3)
Layout_diameter_height
# Since there are 100+ species in the dataset, we will only look at top 10 most popular species found in Vancouver area.
dt=df.groupby('species_name').count()
#Find top 10 species
top_10 = dt.sort_values(by='Unnamed: 0', ascending=False).head(10)
#make a table that only contains top 10
top = df.query('species_name in ["SERRULATA","PLATANOIDES","CERASIFERA", "RUBRUM", "SYLVATICA", "AMERICANA","EUCHLORA X", "BETULUS","CAMPESTRE","FREEMANI X"]')
top.head()
| | Unnamed: 0 | species_name | neighbourhood_name | date_planted | diameter | street_side_name | curb | height_range_id | latitude | longitude | year | month | day | height_diameter_ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 12 | 3515 | SYLVATICA | Renfrew-Collingwood | 2001-05-01 | 3.0 | ODD | Y | 1 | 49.241922 | -123.046271 | 2001 | 05 | 01 | 0.333333 |
| 30 | 14946 | CAMPESTRE | Dunbar-Southlands | 2002-03-28 | 5.0 | EVEN | Y | 3 | 49.239460 | -123.181660 | 2002 | 03 | 28 | 0.600000 |
| 31 | 2503 | CAMPESTRE | Grandview-Woodland | 1989-11-24 | 8.0 | EVEN | Y | 3 | 49.272019 | -123.060910 | 1989 | 11 | 24 | 0.375000 |
| 34 | 1782 | FREEMANI X | Hastings-Sunrise | 2006-11-21 | 4.0 | MED | Y | 2 | 49.269494 | -123.035760 | 2006 | 11 | 21 | 0.500000 |
| 43 | 13617 | RUBRUM | Victoria-Fraserview | 1996-11-07 | 7.5 | EVEN | Y | 2 | 49.212181 | -123.058075 | 1996 | 11 | 07 | 0.266667 |
# Create a bar chart of the top 10 species by number of trees
click = alt.selection_multi()
chart7 = (alt.Chart(top).mark_bar().encode(
alt.X('count()', title='Number of Trees'),
alt.Y('species_name', title='Species Name', sort='x'),
alt.Color('species_name', title="Species Name"),
opacity=alt.condition(click, alt.value(0.9), alt.value(0.2)))
.add_selection(click)).properties(width=400, title= 'Figure 7: Top 10 Species')
chart7
Figure 7 shows the ten most commonly planted tree species across Vancouver: Platanoides, Rubrum, Sylvatica, Cerasifera, Campestre, Betulus, Freemani X, Americana, Serrulata and Euchlora X. Of the 109 unique species in the dataset, these ten account for a large share of the trees.
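The top species can be selected programmatically instead of typing the names by hand, and the same counts give the share of all trees that they cover. This is a minimal sketch assuming a `species_name` column like ours; the helper name `top_species` and the toy species labels are hypothetical.

```python
import pandas as pd

def top_species(frame, n=10):
    """Return the n most frequent species and the share of trees they cover."""
    counts = frame["species_name"].value_counts()
    top_names = counts.head(n).index.tolist()
    share = counts.head(n).sum() / counts.sum()
    return top_names, share

# Toy data: species A appears three times, B twice and C once.
toy = pd.DataFrame({"species_name": ["A", "A", "A", "B", "B", "C"]})
names, share = top_species(toy, n=2)

# Keep only the rows that belong to the selected species.
toy_top = toy[toy["species_name"].isin(names)]
print(names, round(share, 3))
```

On the real data, `top_species(df, n=10)` would return the list used for Figure 7 together with the fraction of the trees that the ten species represent.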
heatmap1 = alt.Chart(top).mark_rect().encode(
alt.Color('count()'),
alt.X('species_name', title='Species Name'),
alt.Y('neighbourhood_name', sort='color', title='Neighbourhood')).properties(width=400, title = 'Figure 8: Heat Map of Residence of Popular Species in Vancouver Neighbourhood')
heatmap1
#A dropdown filter
combined2 = (chart7 & heatmap1)
combined2
species = ["SERRULATA","PLATANOIDES","CERASIFERA", "RUBRUM", "SYLVATICA", "AMERICANA","EUCHLORA X", "BETULUS","CAMPESTRE","FREEMANI X"]
neighbourhood = sorted(top['neighbourhood_name'].unique())
species_dropdown = alt.binding_select(options=species)
neighbourhood_dropdown = alt.binding_select(options=neighbourhood)
species_select = alt.selection_single(fields=['species_name'], bind=species_dropdown, name="species_name")
neighbourhood_select = alt.selection_single(fields=['neighbourhood_name'], bind=neighbourhood_dropdown, name="neighbourhood_name")
filter_species = combined2.add_selection(species_select).transform_filter(species_select)
filter_species2 = filter_species.add_selection(neighbourhood_select).transform_filter(neighbourhood_select)
filter_species2
panel_layout = Layout_diameter_height & filter_species2
Description of the dashboard panel below.
Maps 1 and 2 show the average diameter and height of the trees in each Vancouver neighbourhood. Both maps are interactive (through hovering over neighbourhoods). Additional tooltip interactions show the name of the neighbourhood and the average diameter and height of its trees.
Figure 1 shows the total number of trees planted in each year. Figures 2 and 3 show the average tree height and diameter for each planting year. Figure 4 shows the average height-to-diameter ratio for each year. All four plots are interactive through selection of a year.
Figures 5 and 6 show the diameter and height of the trees for every data point available. Height and diameter slider widgets are included to help explore the data. As the slider value increases, points below it change from blue to red, which helps keep track of and better visualize the data.
Figure 7 shows the top 10 species found in the Vancouver area. This is an interactive bar plot. The heat map (Figure 8) shows the number of trees of each popular species in each Vancouver neighbourhood. Both Figures 7 and 8 are interactive through two filter widgets. The species widget filters both Figures 7 and 8. The neighbourhood widget then narrows the view to the trees that satisfy both widget criteria.
In total we have 2 interactive maps, 4 interactive bar plots, 2 scatter plots controlled by slider widgets, and a heat map and bar plot that interact with each other through dropdown widgets.
panel_layout
Trees now have a fundamental place in many big cities around the world. Large trees are excellent filters for urban pollutants and fine particulates. They absorb pollutant gases and filter dust, dirt or smoke out of the air by trapping them on leaves and bark. Living close to urban green spaces and having access to them can improve physical and mental health. This, in turn, contributes to the well-being of urban communities. Trees also help to reduce carbon emissions by conserving energy, can increase property values, and attract tourism and business.
Given this motivation, I used the 'Vancouver trees' dataset to evaluate the relationship between the height and diameter of a tree and its age. I also wanted to understand the distribution of tree species, age, height, and diameter across Vancouver.
I used about 2,300 data points; the dataset contained 109 unique species and 22 unique neighbourhoods in Vancouver. All rows with null values were removed so that the analysis ran on complete records.
From the various visualizations, the following was determined:
Trees are an important part of urbanization. Given the numerous benefits of having trees in a neighbourhood, it is important for cities to keep evaluating tree biodiversity and to keep up with new plantings as required. The benefits of older trees are especially important. Typically, older trees have greater height and diameter. However, we could also look into investing in tree species that grow more quickly than others. The most popular tree species in Vancouver is Platanoides; it is most common in the Hastings-Sunrise neighbourhood.
In future, it would be interesting to evaluate the data to answer the following additional questions:
Not all the work in this notebook is original. Parts that were borrowed from other resources are as follows: